Salesforce Data Cloud Ingestion from Google - Implementation Template

Application details

Technical considerations

  • The implementation uses the OAuth 2.0 authorization code grant type
  • One instance of the Mule application is deployed per Google Drive
  • Content from Google Drive related to Google Workspace is converted into application/pdf format and sent to Data Cloud
  • Content export for Google Workspace files is restricted to a maximum size of 10 MB
  • Content from Google Drive for non-workspace files is retrieved, optionally encoded as Base64 text, and sent to Data Cloud
  • The /ping endpoint will make an authenticated request to Google Drive
  • Metadata for updated content is sent as a notification to Data Cloud
  • The API uses the GET /change endpoint to retrieve metadata updates instead of relying on the channel for change notifications. This approach keeps the number of notifications low; for example, no notification is sent when a user simply opens a file or folder in Google Drive
  • The Mule application is designed to be stateless
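
The content-handling rules above can be sketched as a small decision function. This is an illustrative Python sketch, not the template's Mule implementation: `ContentPlan`, `plan_content_request`, and `within_export_limit` are hypothetical names, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption. The only external fact it relies on is that Google Workspace MIME types start with `application/vnd.google-apps.`.

```python
from dataclasses import dataclass

# Google Workspace files (Docs, Sheets, Slides, ...) use this MIME type prefix.
WORKSPACE_PREFIX = "application/vnd.google-apps."
# Assumption: "10 MB" is interpreted here as 10 * 1024 * 1024 bytes.
EXPORT_LIMIT_BYTES = 10 * 1024 * 1024


@dataclass
class ContentPlan:
    action: str           # "export" (Workspace file) or "download" (any other file)
    mime_type: str        # MIME type of the content sent to Data Cloud
    base64_encoded: bool  # True when the binary content is sent as Base64 text


def plan_content_request(source_mime_type: str, encode_binary: bool = False) -> ContentPlan:
    """Decide how a file's content would be fetched (hypothetical helper).

    Folders and other non-exportable Workspace types are not handled here.
    """
    if source_mime_type.startswith(WORKSPACE_PREFIX):
        # Workspace content is converted to PDF before being sent.
        return ContentPlan("export", "application/pdf", False)
    # Non-Workspace content is retrieved as-is, optionally Base64 encoded.
    return ContentPlan("download", source_mime_type, encode_binary)


def within_export_limit(exported_size_bytes: int) -> bool:
    """Check an exported size against the export cap."""
    return exported_size_bytes <= EXPORT_LIMIT_BYTES
```

For example, `plan_content_request("application/vnd.google-apps.document")` yields an export to `application/pdf`, while `plan_content_request("image/png", encode_binary=True)` yields a Base64-encoded download.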

Activity diagrams

The following activity diagrams illustrate the processing sequence for ingesting unstructured metadata and retrieving its content on demand.

Initial Load/Full Refresh Synchronous

sdc-ingest-google-full-refresh.png

Initial Load/Full Refresh Asynchronous

sdc-ingest-google-full-refresh.png

Incremental Load

sdc-ingest-google-poll-changes.png

Get Content

sdc-ingest-google-retrieve-content.png

Processing logic

The primary handling and orchestration of unstructured metadata ingestion is implemented in the Salesforce Data Cloud Ingestion from Google Process API. This process is described in more detail in the following sections.

Initial Load/Full Refresh Synchronous

  1. A user action in Data Cloud initiates the request for a full refresh of the content metadata
  2. Data Cloud invokes the Mule application without a continuation token to start the process
  3. Mule application receives the request and will:
    • Retrieve the content metadata from Google Drive
    • Transform the results into the Data Cloud format with a continuation token
  4. Data Cloud invokes the Mule application in a loop, passing the continuation token from the previous response, until all metadata has been retrieved
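
The caller-side loop in this flow can be sketched as follows. This is a minimal Python illustration under stated assumptions: the response shape (`items`, `continuationToken`) and the helpers `fetch_all_metadata` and `make_fake_mule_app` are hypothetical, and the fake application simply uses a list offset as its token.

```python
from typing import Callable, Optional


def fetch_all_metadata(fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Call fetch_page repeatedly until no continuation token is returned."""
    items = []
    token = None  # the first request carries no continuation token
    while True:
        page = fetch_page(token)
        items.extend(page["items"])
        token = page.get("continuationToken")
        if not token:
            return items


def make_fake_mule_app(records: list, page_size: int) -> Callable[[Optional[str]], dict]:
    """Build a stand-in for the Mule application that pages over an in-memory list."""
    def fetch_page(token: Optional[str]) -> dict:
        start = int(token) if token else 0
        end = start + page_size
        # A token is returned only while more records remain.
        next_token = str(end) if end < len(records) else None
        return {"items": records[start:end], "continuationToken": next_token}
    return fetch_page
```

With five records and a page size of two, `fetch_all_metadata(make_fake_mule_app(records, 2))` makes three calls and returns all five records in order.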

Initial Load/Full Refresh Asynchronous

  1. Mule application receives a request to perform an asynchronous refresh of all metadata and will:
    • Retrieve the content metadata from Google Drive
    • Transform the results into the required format for the ingestion API
    • Send the transformed data to the ingestion endpoint
  2. Mule application loops through the result pages, using the continuation token from Google Drive, until all metadata has been retrieved
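
A rough Python sketch of this retrieve, transform, and send loop is shown below. The input fields (`files`, `nextPageToken`, `id`, `name`, `mimeType`, `modifiedTime`) follow the Google Drive v3 `files.list` response; everything else is an assumption, including the output record keys and the `list_files` and `send_batch` callables that stand in for the Drive and ingestion calls.

```python
from typing import Callable, Optional


def to_ingestion_record(drive_file: dict) -> dict:
    """Map a Drive file resource to a record (output keys are hypothetical)."""
    return {
        "sourceId": drive_file["id"],
        "name": drive_file.get("name"),
        "mimeType": drive_file.get("mimeType"),
        "modifiedAt": drive_file.get("modifiedTime"),
    }


def full_refresh_async(
    list_files: Callable[[Optional[str]], dict],
    send_batch: Callable[[list], None],
) -> int:
    """Page through Drive metadata and send each page as one batch.

    Returns the number of records sent.
    """
    sent = 0
    page_token = None  # the first page is requested without a token
    while True:
        page = list_files(page_token)
        records = [to_ingestion_record(f) for f in page.get("files", [])]
        if records:
            send_batch(records)
            sent += len(records)
        page_token = page.get("nextPageToken")
        if not page_token:
            return sent
```

Sending one batch per page keeps memory use bounded by the page size, which suits a stateless application; a real implementation would also need error handling and retries.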

Incremental Load

  1. Mule application runs a scheduler at a given frequency
  2. Mule application invokes the Get Changes operation of the Google Drive API to retrieve metadata changes
  3. Mule application transforms the changes and pushes them to the Data Cloud Ingestion API
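
One polling cycle could look like the Python sketch below. The response fields (`changes`, `nextPageToken`, `newStartPageToken`, `fileId`, `removed`, `file`) follow the Google Drive v3 `changes.list` response. The event shape, the `list_changes` and `push_to_data_cloud` callables, and the way the returned token is kept between runs are assumptions, not details of the template.

```python
from typing import Callable


def to_change_event(change: dict) -> dict:
    """Map a Drive change to an event (output keys are hypothetical)."""
    return {
        "fileId": change["fileId"],
        "operation": "delete" if change.get("removed") else "upsert",
        "metadata": change.get("file"),
    }


def poll_changes(
    list_changes: Callable[[str], dict],
    push_to_data_cloud: Callable[[list], None],
    start_token: str,
) -> str:
    """Process all pending changes and return the token for the next poll."""
    token = start_token
    while True:
        page = list_changes(token)
        events = [to_change_event(c) for c in page.get("changes", [])]
        if events:
            push_to_data_cloud(events)
        if "nextPageToken" in page:
            # More pages of changes remain in this cycle.
            token = page["nextPageToken"]
        else:
            # The last page carries the token to start the next cycle from.
            return page["newStartPageToken"]
```

The scheduler would call `poll_changes` at each interval and pass the returned token into the following run.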

Get Content

  1. Data Cloud initiates the request to retrieve the content
  2. Mule application receives the request to retrieve and stream the content from Google Drive
  3. Mule application attempts to transcode the file to the preferred MIME type requested by Data Cloud, where the Google Drive API supports it

Important note: Requesting binary content with the encodeBinaryContent flag set to true will disable streaming due to the nature of the Base64 encoding operation. This may result in request timeouts when attempting to encode very large files.
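
The trade-off described in the note can be illustrated with a short Python sketch. It is not the template's implementation: `stream_content` and `encode_content` are hypothetical helpers that contrast passing content through in fixed-size chunks with buffering the whole file before Base64 encoding it.

```python
import base64
import io
from typing import BinaryIO, Iterator


def stream_content(source: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the content in chunks; memory use stays near chunk_size."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return
        yield chunk


def encode_content(source: BinaryIO) -> str:
    """Read the entire content into memory, then return it as Base64 text.

    Memory use and time to first byte grow with file size, which is why
    very large files may run into request timeouts in this mode.
    """
    return base64.b64encode(source.read()).decode("ascii")


# Usage with an in-memory stand-in for a file downloaded from Google Drive.
payload = io.BytesIO(b"example content")
encoded = encode_content(payload)
```

Base64 output is also roughly one third larger than the binary input, so the encoded response takes longer to transfer than the streamed one.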

Success conditions

Upon successful completion, the following conditions will be met:

  • All metadata associated with unstructured content in Google Drive is retrieved and processed
  • Changes to metadata related to unstructured content for a Google Drive are processed at scheduled intervals and sent to Data Cloud
  • On-demand content for files stored in Google Drive is retrieved and processed successfully

Type: Template
Organization: MuleSoft
Published by: MuleSoft Solutions
Published on: Nov 21, 2024

Asset versions for 1.0.x: 1.0.10, 1.0.9